Agent
The final policy is used to build the agent, which, like the analytical agent, follows a fixed set of rules:
- The agent first looks at all valid moves and, if any of them is a winning move, plays it.
- Otherwise, the agent checks whether any valid move would let the opponent win on the next turn and, if so, plays that column to block it.
- Otherwise, the agent plays the move recommended by the final policy if it is valid, and falls back to a random valid move if it is not.
import random
import numpy as np

final_policy = policies[10]

def rl_agent(obs, config):
    # A column is playable while its top cell is still empty
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    # Rule 1: play a winning move if one exists
    winning_moves = [move for move in valid_moves if check_winning_move(obs, config, move, obs.mark)]
    if winning_moves:
        return winning_moves[0]
    # Rule 2: block a move that would let the opponent win next turn
    losing_moves = [move for move in valid_moves if check_winning_move(obs, config, move, 3 - obs.mark)]
    if losing_moves:
        return losing_moves[0]
    # Rule 3: follow the policy's recommendation if that column is not full
    col, _ = final_policy.predict(np.array(obs.board).reshape(1, 6, 7))
    if int(col) in valid_moves:
        return int(col)
    # Otherwise fall back to a random valid column
    return random.choice(valid_moves)
The same rule-based wrapper was also used for all the intermediate agents while training the policies.
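The priority order the agent follows (win, then block, then defer to the policy) can be checked in isolation, without a trained policy or the game environment. The following is a minimal standalone sketch, not the notebook's own code: it assumes a standard 6x7 board stored as a flat, row-major list with 0 for empty cells, and the names `drop_piece`, `has_four`, `choose_move` and `policy_move` are hypothetical stand-ins for `check_winning_move` and the policy's prediction.

```python
import random

ROWS, COLUMNS, INAROW = 6, 7, 4  # assumed standard Connect Four dimensions


def drop_piece(board, col, mark):
    """Return a copy of the flat, row-major board after `mark` plays `col`."""
    new_board = list(board)
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upwards
        if new_board[row * COLUMNS + col] == 0:
            new_board[row * COLUMNS + col] = mark
            return new_board
    raise ValueError(f"column {col} is full")


def has_four(board, mark):
    """Check whether `mark` has INAROW pieces in a line anywhere on the board."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    for row in range(ROWS):
        for col in range(COLUMNS):
            for d_row, d_col in directions:
                cells = [(row + i * d_row, col + i * d_col) for i in range(INAROW)]
                if all(0 <= r < ROWS and 0 <= c < COLUMNS
                       and board[r * COLUMNS + c] == mark for r, c in cells):
                    return True
    return False


def choose_move(board, mark, policy_move):
    """Apply the three rules in order; `policy_move` stands in for the policy."""
    valid_moves = [c for c in range(COLUMNS) if board[c] == 0]
    for player in (mark, 3 - mark):  # first try to win, then try to block
        for move in valid_moves:
            if has_four(drop_piece(board, move, player), player):
                return move
    # No immediate win or threat: trust the policy if its column is playable
    return policy_move if policy_move in valid_moves else random.choice(valid_moves)


# Usage: the opponent (mark 2) has three in a row on the bottom row,
# so the agent (mark 1) should block at column 3 regardless of the policy.
board = [0] * (ROWS * COLUMNS)
for col in (0, 1, 2):
    board = drop_piece(board, col, 2)
print(choose_move(board, mark=1, policy_move=6))
```

Passing the policy's suggestion in as a plain integer keeps the rule logic independent of the model, so the same wrapper could be reused with any intermediate policy.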